Abstract

Background: Soil is among the most diverse and complex environments in the world. Soil microorganisms play an 
essential role in biogeochemical cycles and affect plant growth and crop production. However, our knowledge of 
the relationship between species-assemblies and soil ecosystem processes is still very limited. The aim of this study 
was to generate a comprehensive metagenomic survey to evaluate the effect of high-input agricultural practices on 
soil microbial communities. 

Results: We collected soil samples from three different areas in the Argentinean Pampean region under three 
different types of land uses and two soil sources (bulk and rhizospheric). We extracted total DNA from all samples 
and also synthesized cDNA from rhizospheric samples. Using 454-FLX technology, we generated 112 16S ribosomal 
DNA and 14 16S ribosomal RNA amplicon libraries totaling 1.3 million reads and 36 shotgun metagenome libraries 
totaling 17.8 million reads (7.7 GB). Our preliminary results suggested that water availability could be the primary 
driver that defined microbial assemblages over land use and soil source. However, when water was not a limiting 
resource (annual precipitation >800 mm) land use was a primary driver. 

Conclusion: This was the first metagenomic study of soil conducted in Argentina and our datasets are among the 
few large soil datasets publicly available. The detailed analysis of these data will provide a step forward in our 
understanding of how soil microbiomes respond to high-input agricultural systems, and they will serve as a useful 
comparison with other soil metagenomic studies worldwide. 
Background

The Argentine Pampas is a plain area of 60 million ha. 
Because of its large expanse and high yields, it is one of 
the most productive areas for grain crop production in 
the world [1]. Indeed, 90% of the pampean surface is 
currently used for high-input agricultural purposes. 
Argentina is currently the world's third and fourth largest
producer of soybean and maize, respectively [2]. This
production is mostly concentrated in the pampean region.
Since 1980, agriculture has rapidly expanded in the
region, replacing grasslands, with the widespread adoption
of limited tillage systems, particularly no-till with crop 
rotation [3]. These practices have been reported to pre- 
serve surface water, prevent soil erosion and return nu- 
trients to soil [4-6]. However, concerns remain regarding 
the impact of these practices on soil quality, microbial 
diversity and community assemblages. 

Changes in microbial communities throughout the 
Argentine Pampas are poorly reported. Most studies 
have focused on the tillage effects on microbial biomass 
or specific microbial activities such as the utilization of 
specific substrates, extracellular enzyme production, or 
mineralization [7-9]. Other studies have focused on
particular, well-studied bacterial taxa rather than the
microbial community structure itself [10,11]. Studies 
conducted with an ecological approach have usually fo- 
cused on the individual effects of land use such as the ap- 
plication of herbicides [12,13]. In such cases, community 
variability was assessed using classical fingerprinting tech- 
niques (such as RFLP and DGGE), which only capture the 
most dominant species in the environment [14,15]. In this 
regard, classical approaches are inadequate for describing 
highly diverse soil microbial communities. 

High-throughput sequencing (HTS) has opened a new 
era for environmental microbial studies as large amounts 
of genetic information can be obtained without culturing. 
Some recent studies have used amplicon and shotgun 
metagenome pyrosequencing to characterize soil micro- 
bial communities worldwide [16-20]. These strategies have 
allowed a more exhaustive characterization of community 
patterns, composition and metabolic capabilities, and con- 
tinue to change our understanding of the microbial world. 
To date, however, HTS approaches have not been 
employed in Argentina as a means to compare tillage sys- 
tems and evaluate land use effects on soil microbial 
communities. 

In this study, we examined the impact of agricultural 
management on soil microbial communities. To do so, 
we collected soil samples from sites under three different 
types of land use (conventional tillage, no till and no 
agriculture), at each of five different locations in the 
Argentine Pampas region (Figure 1). From these sam- 
ples, we generated amplicon and shotgun metagenome 
libraries, which were subsequently sequenced using 454- 
FLX pyrosequencing. Together these data compose the 
designated PAMPA datasets. 

Methods 

Soil samples were obtained at five different sites in the 
Argentinean Pampas located in three isohyet regions 
(Figure 1): three production fields in the rolling pampas
(La Estrella: LE, La Negrita: LN, Criadero Klein: CK; wet,
1,000 to 1,200 mm annual precipitation) and two
experimental stations, at Balcarce (Ba; semi-wet, 800 to
1,000 mm annual precipitation) and Anguil (An; semi-arid,
600 to 800 mm annual precipitation). At each experimental
station, soils were collected
from three plots, with three different types of land use: 
conventional tillage (CT), no till (NT) and soils with no 
agricultural (NA) management. Bulk soil was obtained 
from all plots included in this study. In addition, wheat 
rhizospheric soil was also obtained from the Anguil 
CT and NT plots. Only one sampling campaign was 
performed at each site, except at the La Estrella produc- 
tion field in the rolling pampas where there were six 
sampling time points over a year. At least two independent
soil samples from each plot and land use site were
collected, resulting in a total of 30 samples for Anguil
station, 20 for Balcarce station and 62 for the rolling pampas
region (see Additional files 1 and 2 for a detailed descrip- 
tion of sampling strategy and sample processing). Total 
DNA was prepared from all soil samples. In addition, total 
cDNA was also prepared from Anguil rhizospheric samples. 
Amplicon sequencing libraries were constructed by PCR 
amplification of the V4 variable region in the 16S rRNA 
gene. Shotgun metagenome libraries were also constructed 
from one genomic DNA (gDNA) (and one cDNA, when 
available) sample obtained from each plot (see Additional 
files 1 and 2 for further details). Amplicon and shotgun li- 
braries were sequenced using 454-FLX-Titanium chemistry. 
Raw data processing was performed following standard 
procedures suggested by the manufacturer. 

We obtained a total of 19,325,913 reads and
7,740,811,541 bases from 30 samples by 454-FLX shotgun
metagenome sequencing and 1,051,470 16S ribosomal 
DNA and ribosomal RNA (rDNA/rRNA) reads from 126 
samples by amplicon sequencing. The metatranscriptomic 
shotgun libraries were excluded from the analysis due to 
the low number of reads recovered after rRNA trimming 
(more than tenfold below other samples). The amplicon
dataset was analyzed using the QIIME v1.5 software package
[21]. Shotgun metagenome datasets were annotated by 
BLAST against the NCBI database and subsequent results 
imported into MEGAN [22] for further analysis. Numer- 
ical and statistical analyses were performed using the 
METAGENassist software [23] and the R packages 
'BiodiversityR' and 'vegan' (R Development Core Team)
(see Additional file 1). 

Quality assurance 

To rule out possible contaminants from non-microbial 
species, such as plant, human or any other allochthon- 
ous DNA, in our metagenome shotgun libraries, a tax- 
onomy assignment of all reads was assessed. We 
performed peptide prediction using FragGeneScan [24] 
followed by BlastP annotation against the NCBI database.
The Blast output was analyzed using MEGAN [22].
The results showed that 95% of the classified sequences 
were identified as Bacteria, 1% as Eukarya and 0.6% as 
Archaea, whereas the remaining 3.4% of sequences could 
not be classified above the cellular organism level (data 
not shown). Within the Eukarya, 42% of reads were clas- 
sified as Viridiplantae (plants), 27% as Fungi, 12% as 
Metazoa and 6% as diatoms, while the remaining 13% belonged
to other groups or could not be classified (data not shown).
Plant sequences
are likely to be from decomposing material. These re- 
sults suggest that contamination with allochthonous 
DNA is minimal or nonexistent as we could not identify 
any genetic material from unexpected species in the soils 
(for example, humans). 
Figure 1 Sampling sites and experimental design for PAMPA datasets. The geographic location of the Argentinean Pampas is marked in 
grey on the map of South America. The isohyets in the region are shown in blue (top left). Soil samples were taken in three different isohyets 
and are indicated with numbers (1: La Estrella, 2: La Negrita, 3: Criadero Klein in the wet rolling pampas region, 4: Balcarce, a semi-wet region, 5: 
Anguil, a semi-arid region). The experimental design is indicated in a table below the map. Soil source, genetic material, land use and sequencing 
method are indicated for each sampling site. The number of replicates per sample analyzed by each sequencing method is shown inside the 
boxes. Additional and detailed information on each type of library per sampling site can be found in Additional file 2: Table S1. gDNA, genomic 
DNA; rDNA, ribosomal DNA; rRNA, ribosomal RNA. 



Initial findings 

We found that geographic-specific differences, possibly 
associated with water availability, were evident in the 
16S rRNA amplicon analysis of 103 soil communities 
(23 samples were excluded from the preliminary analysis 
due to differences in sequencing depth and other biases, 
see Additional files 1 and 2). The semi-arid soils (An) 
harbored communities that clustered separately from the 
wet (LE, LN, CK) and semi-wet (Ba) soil microbial
communities (analysis of similarity: ANOSIM = 0.672,
P < 0.001, Figure 2A, Additional file 3: Figure S1). This
observation could be explained by the very different
environmental conditions in the two areas: the eastern area (wet
and semi-wet) is humid and fertile with fine-textured 
soils that are rich in organic matter, while the western 
area is semi-arid with shallow coarse-textured soils with 
low levels of organic matter. We used Bioenv analysis 
(see Additional file 1 for further details of the analysis) 
to test which soil properties best explained the variation 
in microbial community structure. We found that clay,
organic matter content, pH and salinity were the most
influential variables (Mantel test: r = 0.6209, P = 0.001).

Figure 2 Principal component analysis. (A) A total of 103 soil samples were analyzed by 16S rDNA/rRNA V4 amplicon sequencing. Sequences 
were clustered in OTUs at 90% similarity. Low abundance and infrequent OTUs were excluded from the analysis (see Additional file 1 for a 
detailed description of the filtering procedures). Datasets were normalized before PCA. (B) Differences among 16S rDNA and rRNA were evident 
in the first three axes of the PCA. (C) A total of 30 soil samples were analyzed by metagenomic shotgun sequencing. Predicted peptides 
were annotated by BlastP against the NCBI database and the results assigned to categories in the SEED database. Low abundance and infrequent 
SEED categories were excluded from the analysis (see Additional file 1). Datasets were normalized before PCA. OTU, operational taxonomic unit; 
PCA, principal component analysis; rDNA, ribosomal DNA; rRNA, ribosomal RNA. 
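
The preprocessing named in the Figure 2 caption (excluding low-abundance and infrequent OTUs, then normalizing) can be illustrated with a minimal Python sketch. The actual thresholds and normalization method are described in Additional file 1 and are not reproduced here; the cut-offs `min_total` and `min_samples`, the use of per-sample relative abundance, and the toy count table are all hypothetical choices made only for illustration.

```python
def filter_and_normalize(counts, min_total=10, min_samples=2):
    """Drop rare OTUs, then convert counts to per-sample relative abundance.

    counts maps an OTU id to a list of read counts, one entry per sample.
    min_total and min_samples are hypothetical cut-offs, not the study's.
    """
    kept = {
        otu: row
        for otu, row in counts.items()
        if sum(row) >= min_total and sum(1 for c in row if c > 0) >= min_samples
    }
    n_samples = len(next(iter(counts.values())))
    # Per-sample totals are computed after filtering, so columns sum to 1.
    totals = [sum(row[s] for row in kept.values()) for s in range(n_samples)]
    return {
        otu: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for otu, row in kept.items()
    }


# Toy table with three samples; otu_c is rare and present in one sample only.
toy_counts = {
    "otu_a": [50, 30, 20],
    "otu_b": [50, 70, 80],
    "otu_c": [1, 0, 0],
}
normalized = filter_and_normalize(toy_counts)
```

In this toy table `otu_c` is removed by both cut-offs, and the remaining two OTUs are rescaled so that each sample sums to 1 before any ordination is attempted.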

Differences in microbial communities within the semi-arid
region (An) were largely determined by soil source,
that is, rhizospheric versus bulk soil (ANOSIM =
0.5614, P < 0.001, Figure 2A, Additional file 3: Figure S1).
In addition, rhizospheric samples clustered separately
depending on the type of genetic material amplified
(ANOSIM = 0.5169, P = 0.001, Figure 2B, Additional file
3: Figure S1). At the DNA level, active, inactive and even
dead microorganisms were detected, that is, all the mi- 
crobes present in the sample. However, at the RNA level, 
only metabolically active microorganisms were detected 
due to their high rates of rRNA expression. Our results 
show that rhizospheric microbial signatures detected by 
16S rDNA are clearly distinct from those detected by 16S 
rRNA, suggesting that bacterial activity was not necessar- 
ily correlated with bacterial abundance. 

Land use was another important driver that defined
microbial community assemblages. Bulk soil samples
clustered separately depending on land use (ANOSIM: Anguil
= 0.3954, P = 0.017; Balcarce = 0.3795, P = 0.001; rolling
pampas = 0.2072, P = 0.01, Additional file 3: Figure S1).
Moreover, samples collected from soils under different
tillage systems at the two experimental stations (Ba, An) also
clustered separately in the analysis (ANOSIM: Balcarce =
0.5476, P = 0.001; Anguil = 0.2652, P = 0.001, Additional
file 3: Figure S1). These results suggest that different
microbial communities were selected under each type of soil
management.
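
As a minimal sketch of how an ANOSIM statistic of the kind reported above can be computed, the following Python code ranks all pairwise dissimilarities, compares the mean between-group rank with the mean within-group rank, and estimates P by permuting the group labels. The function names, the toy distance matrix, the group labels and the choice of 999 permutations are illustrative assumptions; the analyses in this study were run with the software cited in the Methods, not with this code.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values starting at 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, P), with P estimated by shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarities for four samples under two land uses.
toy_dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
toy_groups = ["CT", "CT", "NT", "NT"]
r_stat, p_value = anosim(toy_dist, toy_groups)
```

In this toy case every between-group dissimilarity exceeds every within-group one, so R reaches its maximum of 1. Values closer to 0, such as the 0.2072 reported for the rolling pampas, indicate groups that overlap more strongly.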

The evaluation of metabolic categories using the
shotgun metagenome libraries also showed that semi-arid
western locations were different from wet and semi-wet
eastern sites (ANOSIM = 0.2806, P < 0.001). Therefore,
we propose that water availability is probably the
primary driver that shapes microbial communities
(Figure 2C, Additional file 3: Figure S1). There was also
clear separation by soil source in western semi-arid
samples (ANOSIM = 0.6688, P < 0.001, Figure 2C,
Additional file 3: Figure S1). In addition, bulk soil
samples clustered separately according to tillage system in
An and Ba (ANOSIM: Balcarce = 0.5391, P = 0.01;
Anguil = 0.2346, P = 0.02, Additional file 3: Figure S1).
However, the latter observation was less defined for 
rhizospheric samples, suggesting that other conditions, 
such as plant phenotype and exudates, could determine 
bacterial populations in rhizospheric communities. The 
soil properties that best explained the functional vari- 
ation between samples for shotgun sequencing analysis 
were silt, organic matter, nitrogen content, pH and salin- 
ity (Mantel test: r = 0.2771, P = 0.002). 
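
The Mantel statistics quoted for soil properties can be illustrated with a minimal Python sketch: the upper triangles of two distance matrices over the same samples are correlated, and significance is estimated by permuting the rows and columns of one matrix together. The use of a Pearson correlation, the function names, the toy matrices and the 999 permutations are assumptions made for illustration; the study's own procedure is described in Additional file 1.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def permute_matrix(matrix, order):
    """Reorder rows and columns of a distance matrix with the same order."""
    return [[matrix[i][j] for j in order] for i in order]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, P) for the correlation between two distance matrices."""
    rng = random.Random(seed)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    order = list(range(len(dist_a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        shuffled = upper_triangle(permute_matrix(dist_a, order))
        if pearson(shuffled, flat_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical community and soil-property distances for four samples.
community = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
environment = [[2 * value for value in row] for row in community]
mantel_r, mantel_p = mantel(community, environment)
```

Because the toy environmental distances are an exact multiple of the community distances, the correlation here is essentially 1. Real data give weaker values, such as the r = 0.2771 reported for the shotgun libraries.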

Even though additional work is required, preliminary
results indicated that differences in microbial communities
were largely defined by the variables considered, for
example, water availability, geographic location, soil source,
genetic material amplified and land use or tillage system.
However, this was not always observed at the functional
metagenomic level, since some samples showed patterns
different from those in amplicon analysis (Additional file
3: Figure S1). Differences between the amplicon and
shotgun analyses could be due to the fact that the 16S
rDNA/rRNA operational taxonomic unit (OTU) analysis was
performed by clustering sequences based on similarity,
while the metagenomic analysis was based on sequence
annotation, constrained by SEED database size, its limited
number of categories and their ambiguity in sequence
identity. Nevertheless, we could not rule out the possibility
that very different microbial species could have similar
metabolisms, thus minimizing the differences at the
metabolic level.

Rascovan et al. Microbiome 2013, 1:21
http://www.microbiomejournal.com/content/1/1/21

Future directions 

The present project represents the first large-scale
metagenomic study of soils in Argentina that explores the
link between agricultural management and the soil
microbiome. The resulting PAMPA datasets are among
the few available soil metagenomic datasets based on
high-throughput sequencing [17], and here we present a
preliminary analysis of our data. While more detailed
analysis will be needed to test the ideas presented in this
paper, results so far have shed considerable light on the 
largely unknown soil micro-ecosystem of the Argentine 
Pampas. We showed that the soil microbiome changes 
primarily because of water availability and agricultural 
land use, and that these changes are also linked to differ- 
ent tillage systems (no-till or conventional tillage). 

Additional analysis of the PAMPA datasets will con- 
tinue to expand our knowledge of soil microbiome com- 
position and function. Future efforts will be directed at 
identifying particular species and metabolisms associated 
with each tillage system in each geographic region and 
enriched by the rhizosphere. In addition, the PAMPA 
datasets can also be used in future worldwide soil 
metagenomic projects for comparative purposes. Add- 
itional experimental and sequencing efforts will be 
needed to describe in detail the root-associated microor- 
ganisms for different crops in different conditions. Un- 
derstanding soil microbial dynamics and identifying 
specific plant-interacting microbes will be important 
steps towards improving current agricultural and soil 
sustainability practices. 

Availability of supporting data 

All data are publicly available and can be accessed
through BioProject PRJNA178180 or directly from the
NCBI Sequence Read Archive (SRA) under the accession
numbers SRA058523 and SRA056866 (see Additional file 2:
Table S1 for detailed information). Additional
information to that presented in this paper will be available from
the Soil Genetic Network (SoilGeNe) website [25]. 

Additional files 



Additional file 1: Supplemental methods. Detailed description of all 
materials and methods used to generate and analyze the PAMPA 
datasets. 

Additional file 2: Table S1. Metadata for all samples analyzed in the 
PAMPA datasets. There is a full list of amplicon and shotgun 
metagenome libraries. Soil types, source of genetic material, sequencing 
strategies, primers and barcodes used, number of sequences obtained, 
physicochemical properties and general metadata for each sample are 
described in detail. 

Additional file 3: Figure S1. Heatmap and beta-diversity analysis for 
amplicon and metagenome shotgun libraries in PAMPA datasets. (A) A 
total of 103 soil samples were analyzed by 16S rDNA/rRNA V4 amplicon 
sequencing. Sequences were clustered in OTUs at 90% similarity. Low 
abundance and infrequent OTUs were excluded from the analysis (see 
Additional file 1 for a detailed description of the filtering procedures). 
Datasets were normalized and compared using the Pearson distance and 
Ward clustering algorithm. The scale bar at the top is expressed 
according to the range of values after normalization. Metadata for each 
sample are indicated by color bars at the right and references are 
indicated at the top. (B) A total of 30 soil samples were analyzed by 
metagenomic shotgun sequencing. Predicted peptides were annotated 
by BlastP against the NCBI database and the results assigned to SEED 
categories. Low abundance and infrequent SEED categories were 
excluded from the analysis (see Additional file 1). Datasets were 
normalized and compared using the Pearson distance and Ward 
clustering algorithm. Metadata are indicated with same references as in 
A. An, Anguil; B, bulk soil; Ba, Balcarce; CK, Criadero Klein; CT, 
conventional tillage; LE, La Estrella; LN, La Negrita; NA, no agriculture; 
NT, no till farming; R, rhizospheric soil; RP, rolling pampas. 